Mining Block I/O Traces for Cache Preloading with Sparse Temporal Non-parametric Mixture of Multivariate Poisson
Authors
Abstract
Existing caching strategies in the storage domain, though well suited to exploiting short-range spatio-temporal patterns, are unable to leverage long-range motifs for improving hit rates. Motivated by this, we investigate novel Bayesian non-parametric (BNP) modeling techniques for count vectors to capture long-range correlations for cache preloading by mining block I/O traces. Such traces comprise a sequence of memory accesses that can be aggregated into high-dimensional, sparse, correlated count vector sequences. While there are several state-of-the-art BNP algorithms for clustering and their temporal extensions for prediction, there has been no work on exploring these for correlated count vectors. Our first contribution addresses this gap by proposing a DP-based mixture model of multivariate Poisson (DP-MMVP) and its temporal extension (HMM-DP-MMVP) that captures the full covariance structure of multivariate count data. However, modeling the full covariance structure for count vectors is computationally expensive, particularly for high-dimensional data. Hence, we exploit sparsity in our count vectors and, as our main contribution, introduce the sparse DP mixture of multivariate Poisson (Sparse-DP-MMVP), which generalizes our DP-MMVP mixture model and also leads to more efficient inference. We then discuss a temporal extension to our model for cache preloading. We take the first step towards mining historical data to capture long-range patterns in storage traces for cache preloading. Experimentally, we show a dramatic improvement in hit rates on benchmark traces and lay the groundwork for further research in the storage domain to reduce latencies using data mining techniques to capture long-range motifs.
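To make the aggregation step mentioned in the abstract concrete, here is a minimal sketch of one way a block I/O trace could be turned into a sequence of sparse count vectors. It is an illustration only, not the authors' pipeline: the function name `aggregate_trace`, the fixed-length time windows, and the address-bin size are all assumptions introduced for this example.

```python
from collections import Counter


def aggregate_trace(accesses, window, bin_size):
    """Turn (timestamp, block_address) pairs into one sparse count vector
    per time window.

    Assumed scheme (hypothetical, not from the paper): time is cut into
    fixed windows of length `window`, and block addresses are grouped into
    bins of `bin_size` consecutive blocks. Each vector is a Counter mapping
    a bin index to the number of accesses to that bin in the window, so
    bins with zero accesses are simply absent (sparse representation).
    """
    windows = {}
    for timestamp, block in accesses:
        w = int(timestamp // window)
        windows.setdefault(w, Counter())[block // bin_size] += 1
    if not windows:
        return []
    first, last = min(windows), max(windows)
    # Emit an empty vector for windows that saw no accesses at all.
    return [windows.get(w, Counter()) for w in range(first, last + 1)]


# Toy trace: (seconds since start, block address).
trace = [(0.1, 5), (0.4, 7), (0.9, 1030), (1.2, 5), (2.7, 2050), (2.8, 2051)]
vectors = aggregate_trace(trace, window=1.0, bin_size=1024)
for t, vec in enumerate(vectors):
    print(t, dict(vec))
```

Storing only the non-zero bins keeps each vector small even when the address space is very large, which is the sparsity the abstract refers to.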
Similar resources
Assessment of an ore body internal dilution based on multivariate geostatistical simulation using exploratory drill hole data
Dilution can best be defined as the proportion of waste tonnage to the total weight of ore and waste in each block. Predicting the internal dilution based on geological boundaries of waste and ore in each block can help engineers to develop more reliable long-term planning designs in mining activities. This paper presents a method to calculate the geological internal dilution in each block and ...
Advanced mixtures for complex high dimensional data: from model-based to Bayesian non-parametric inference
Cluster analysis of complex data is an essential task in statistics and machine learning. One of the most popular approaches in cluster analysis is the one based on mixture models. It includes mixture-model based clustering to partition individuals or possibly variables into groups, block mixture-model based clustering to simultaneously associate individuals and variables to clusters, that is c...
Local multivariate outliers as geochemical anomaly halos indicators, a case study: Hamich area, Southern Khorasan, Iran
Anomaly recognition has always been a prominent subject in preliminary geochemical explorations. For processing regional geochemical data, there is a range of statistical and data mining techniques, as well as different mapping methods that serve to present the outputs. Outlier values are of interest in investigations where data are gathered under controlled condition...
Semi-parametric modeling of excesses above high multivariate thresholds with censored data
How to include censored data in a statistical analysis is a recurrent issue in statistics. In multivariate extremes, the dependence structure of large observations can be characterized in terms of a non parametric angular measure, while marginal excesses above asymptotically large thresholds have a parametric distribution. In this work, a flexible semi-parametric Dirichlet mixture model for ang...
Deconstructing on-board disk cache by using block-level real traces
On-board disk cache is an effective approach to improve disk performance by reducing the number of physical accesses to the magnetic media. Disk drive manufacturers are increasing the on-board disk cache size to match the capacity growth of the backend magnetic media. Some disk drives nowadays have a cache of 32 MB. Modern computer systems use large amounts of memory to improve performance, any...
Publication date: 2015